Automatic generation of phone sets and lexical transcriptions
Abstract
Automatic Speech Recognition (ASR) systems with even moderately large recognition vocabularies model words as sequences of subword units, or phonemes. The set of these phonemes, the phoneset, forms the basic units that the ASR system is trained to classify. This set is usually small, typically about 40 phones for English. The ASR system uses a dictionary in which every word in the system’s vocabulary is transcribed in terms of these phones. The phoneset and the dictionary are specific to a language and are designed manually by an expert. The performance of the ASR system depends critically on the accuracy of the dictionary. In this paper we attempt to design the phoneset and the dictionary automatically, using only the training data and their transcriptions. To do this, we jointly optimize the dictionary and the acoustic models for an evolving phoneset, using a maximum a posteriori (MAP) formulation to optimize the dictionary and a maximum likelihood (ML) formulation to optimize the acoustic models. Experimental results on the Resource Management (RM) corpus show that such an automatically derived phoneset yields recognition accuracies close to those obtained with a manually designed phoneset and dictionary.
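The alternation between a MAP update of the dictionary and an ML update of the acoustic models can be illustrated with a toy sketch. This is not the paper's implementation. It assumes a fixed number of units rather than an evolving phoneset, scalar observations with one frame per unit instead of HMMs over feature vectors, an exhaustive search over candidate transcriptions, and a made-up prior that penalizes repeated units. All function names (`map_lexicon`, `ml_means`, `train`, `log_prior`) are hypothetical.

```python
import math
from itertools import product


def log_gauss(x, mean, var=1.0):
    """Log density of a 1-D Gaussian (toy acoustic model for one unit)."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def log_prior(seq, penalty=1.0):
    """Hypothetical dictionary prior: penalize immediately repeated units."""
    return -penalty * sum(a == b for a, b in zip(seq, seq[1:]))


def map_lexicon(data, means):
    """MAP step: for each word, choose the unit sequence that maximizes
    the log-likelihood of all its tokens plus the log prior.
    Assumes every token of a word has the same number of frames."""
    tokens = {}
    for word, frames in data:
        tokens.setdefault(word, []).append(frames)

    lexicon = {}
    for word, examples in tokens.items():
        length = len(examples[0])

        def score(seq):
            ll = sum(log_gauss(x, means[u])
                     for frames in examples
                     for u, x in zip(seq, frames))
            return ll + log_prior(seq)

        # Exhaustive search over all unit sequences of this length (toy only).
        lexicon[word] = max(product(range(len(means)), repeat=length), key=score)
    return lexicon


def ml_means(data, lexicon, old_means):
    """ML step: with fixed variance, the ML mean of each unit is the average
    of the frames assigned to it; unused units keep their old mean."""
    sums = [0.0] * len(old_means)
    counts = [0] * len(old_means)
    for word, frames in data:
        for u, x in zip(lexicon[word], frames):
            sums[u] += x
            counts[u] += 1
    return [s / c if c else m for s, c, m in zip(sums, counts, old_means)]


def train(data, init_means, n_iters=5):
    """Alternate MAP dictionary updates and ML acoustic-model updates."""
    means = list(init_means)
    lexicon = {}
    for _ in range(n_iters):
        lexicon = map_lexicon(data, means)
        means = ml_means(data, lexicon, means)
    return lexicon, means
```

Here `data` is a list of `(word, frames)` pairs, for example `[("ab", [0.1, 4.9]), ("ba", [5.0, 0.0])]`, and `train(data, init_means=[1.0, 4.0])` returns the learned transcriptions together with the unit means.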
Similar resources
Automatic generation of phonetic transcriptions for large speech corpora
We describe a method for the automatic production of phonetic transcriptions in large speech corpora. First, we focus on the application of different techniques for the generation of pronunciation variants. Then, we explain the application of a speech recognition system for selecting the acoustically best matching phonetic transcription. The system is evaluated on different test sets selected f...
Automatic Construction of Persian ICT WordNet using Princeton WordNet
WordNet is a large lexical database of the English language in which nouns, verbs, adjectives, and adverbs are grouped into sets of cognitive synonyms (synsets). Each synset expresses a distinct concept. Synsets are interlinked by both semantic and lexical relations. WordNet is mainly used for word sense disambiguation, information retrieval, and text translation. In this paper, we propose s...
Extracting true speaker identities from transcriptions
Automatic speaker diarization generally produces a generic label such as spkr1 rather than the true identity of the speaker. Recently, two approaches based on lexical rules were proposed to extract the true identity of the speaker from the transcriptions of the audio recording without any a priori acoustic information: one uses n-grams, the other uses semantic classification trees (SCT). The ...
Use of Graphemic Lexicons for Spoken Language Assessment
Automatic systems for practice and exams are essential to support the growing worldwide demand for learning English as an additional language. Assessment of spontaneous spoken English is, however, currently limited in scope due to the difficulty of achieving sufficient automatic speech recognition (ASR) accuracy. "Off-the-shelf" English ASR systems cannot model the exceptionally wide variety of...
Faster time-aligned phonetic transcriptions through partial automation
A semi-automatic process for generating time-aligned transcriptions of speech data at the word and phone level is described. At each stage in the process, segment durations are estimated to generate approximate boundary markers, which are then corrected by hand. Corrections at one level are taken into account in the generation of boundaries for the next level, such that the error is reduced at ...
Publication date: 2000